perm filename A51.TEX[106,PHY] blob
sn#807749 filedate 1985-11-21 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00002 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 \magnification\magstephalf
C00010 ENDMK
C⊗;
\magnification\magstephalf
\input macro.tex
\def\today{\ifcase\month\or
January\or February\or March\or April\or May\or June\or
July\or August\or September\or October\or November\or December\fi
\space\number\day, \number\year}
\baselineskip 14pt
\rm
\line{\sevenrm a51.tex[106,phy] \today\hfill}
\bigskip
\noindent
{\bf Sorting/Programming Economics.}
I want to write a program to read {\tt N} numbers, and put them into
array~{\tt A} in
increasing order. In practice, such a program would more likely be reading
strings and using alphabetical order, but using numbers keeps the algorithm
simple. If the numbers arrive in the order
$$\hbox{388 909 476 820 245 371}$$
the successive contents of the variables used by the program will be
$$\vcenter{\halign{$\ctr{#}$\qquad&$\ctr{#}$\qquad%
&$\ctr{#}$\qquad&$\ctr{#}$\qquad%
&$\ctr{#}$\qquad&$\ctr{#}$\qquad&$\ctr{#}$\cr
{\tt I}&{\tt A[1]}&{\tt A[2]}&{\tt A[3]}&{\tt A[4]}&{\tt A[5]}&{\tt A[6]}\cr
\noalign{\smallskip}
\noalign{\hrule}
\noalign{\smallskip}
0&?&?&?&?&?&?\cr
1&388&?&?&?&?&?\cr
2&388&909&?&?&?&?\cr
3&388&476&909&?&?&?\cr
4&388&476&820&909&?&?\cr
5&245&388&476&820&909&?\cr
6&245&371&388&476&820&909\cr}}$$
Each successive arrival is inserted into the array by moving the previous
numbers to the right, starting with the rightmost ones, until a number is
found which is not larger than the new one. In Pascal:
\smallskip
{\obeylines\obeyspaces\let =\ \tt
FOR I:=1 TO N DO
BEGIN
READ(NEWDATUM);
J:=I; (* CANDIDATE LOCATION FOR NEWDATUM *)
WHILE(NEWDATUM<A[J-1]) AND (J>1) DO
BEGIN
A[J]:=A[J-1];
J:=J-1;
END;
A[J]:=NEWDATUM
END
}
\smallskip\noindent
Drill: In the above algorithm, nothing is ever placed in {\tt A[0]},
but that variable must exist. Why? [RWF: Explain elsewhere.]
The inner iteration can be simplified slightly by keeping
in {\tt A[0]} a~number that
is not greater than the new datum. In effect, the small number in
{\tt A[0]} serves as a sentinel to stop decreasing {\tt J}
so that the test for {\tt J>1}
is no longer needed. The above algorithm is preceded by {\tt A[0]:=SMALL}, where
{\tt SMALL} is the smallest possible constant, and the inner iterative clause
becomes {\tt WHILE NEWDATUM < A[J-1] DO}.
The sorting method given above is called sorting by insertion. It is a
very good algorithm if {\tt N} is no more than 10 or~15. If a program uses it
many times with a large value of~{\tt N}, it may be worthwhile to use a more
efficient algorithm. To make the least expensive choice, you need to
balance the cost of the wasted computer time against the cost of the
programming time to design and debug a better algorithm.
When {\tt N} is large, other algorithms can sort many times faster (more
than ten times faster if {\tt N=1000}), so almost the entire running time is
wasted. The insertion sort algorithm, on the average, moves the average
datum half way through a sequence of {\tt N/2} previous data, so the number of
times through the inner iteration is about ${\tt N}\times 1/2\times
{\tt N}/2 = {\tt N}↑2/4$. There
are about ten machine operations in the inner iteration, each taking up
about $10↑{-6}$ seconds on my computer, for a total time of $10↑{-5}\times n↑2/4$
seconds. If computer time costs \$0.25 per second, and {\tt N} is~5000, the cost
is about \$15 for each sorting job. Suppose that
a better algorithm requires
two hours work for a programmer paid \$25 per hour.
A~better algorithm should then be used if the sorting will be done four times or
more. (Remember, in estimating programmer time, to include debugging and
documentation as well as time to find or design the algorithm).
The typical reader of this book is a student, working problems that
use insignificant amounts of computer time, so the emphasis here is on
simplicity and reliability rather than efficiency.
A professional approach to programming, though, includes awareness of the
tradeoffs between labor and machine costs, and willingness to investigate
the extensive research results on program efficiency. For our example of
sorting, Knuth's {\sl Sorting and Searching\/} is a definitive study. The
program libraries for many computers include excellent sorting programs.
\bigskip
\parindent0pt
\copyright 1984 Robert W. Floyd
First draft April 4, 1984
\bye